{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction\n",
    "\n",
    "In this chapter we will start to perform analysis for all cities rather than just one. Say for example counting the number of museums for each city. In the process we will learn some Advanced Spark Dataframe concepts.\n",
    "\n",
    "## Advanced Dataframe API\n",
    "\n",
    "Most of the time in Spark SQL you can use Strings (in SQL) to reference columns but there are two cases where  you’ll want to use the Column objects rather than Strings :\n",
    "\n",
    "* In Spark SQL DataFrame columns are allowed to have the same name, they’ll be given unique names inside of Spark SQL, but this means that you can’t reference them with the column name only as this becomes ambiguous.\n",
    "\n",
    "* When you need to manipulate columns using expressions like Adding two columns to each other, Twice the value of this column or even \"Is the column value larger than 0?\", you won’t be able to use simple strings and will need the Column reference.\n",
    "\n",
    "* Finally if you need renaming, cast or any other complex feature, you’ll need the Column reference too.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Joins, UDFs, Broadcast Variables, Spatial Predicate\n",
    "\n",
    "## Joins\n",
    "Joining data is an important part of many of our pipelines, and both Spark core and Spark SQL support the same fundamental types of joins. While joins are very common and powerful, they warrant special performance consideration as they may require large network transfers or even create data sets beyond our capability to handle. In core Spark it can be more important to think about the ordering of operations, since the DAG optimizer, unlike the SQL optimizer isn’t able to re-order or push down filters.\n",
    "\n",
    "## User-Defined Functions (aka UDF)\n",
    "\n",
    "User-Defined Functions (aka UDF) is a feature of Spark SQL to define new Column-based functions that extend the vocabulary of Spark SQL’s DSL to transform Datasets.\n",
    "\n",
    "## Broadcast variables \n",
    "\n",
    "Broadcast variables allow the programmer to keep a read-only variable cached on each machine rather than shipping a copy of it with tasks. Explicitly creating broadcast variables is only useful when tasks across multiple stages need the same data or when caching the data in deserialized form is important."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise: for each city count the number of museums\n",
    "\n",
    "In this Exercise for each city we will count the number of museums and return a Dataframe with: city_name, museum_count"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Populating the interactive namespace from numpy and matplotlib\n"
     ]
    }
   ],
   "source": [
    "# %load './code/helpers/imports.py'\n",
    "import notebook\n",
    "import os.path, json, io, pandas\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "%pylab inline\n",
    "pylab.rcParams['figure.figsize'] = (54, 60)\n",
    "\n",
    "\n",
    "from retrying import retry # for exponential back down when calling TurboOverdrive API\n",
    "\n",
    "import pyspark.sql.functions as func # resuse as func.coalace for example\n",
    "from pyspark.sql.types import StringType, IntegerType, FloatType, DoubleType,DecimalType\n",
    "\n",
    "import pandas as pandas\n",
    "from geopandas import GeoDataFrame # Loading boundaries Data\n",
    "from shapely.geometry import Point, Polygon, shape # creating geospatial data\n",
    "from shapely import wkb, wkt # creating and parsing geospatial data\n",
    "import overpy # OpenStreetMap API\n",
    "\n",
    "from ast import literal_eval as make_tuple # used to decode data from java\n",
    "\n",
    "# make sure nbextensions are installed\n",
    "notebook.nbextensions.check_nbextension('usability/codefolding', user=True)\n",
    "\n",
    "try:\n",
    "    sc\n",
    "except NameError:\n",
    "    import pyspark\n",
    "    sc = pyspark.SparkContext('local[*]')\n",
    "    sqlContext = pyspark.sql.SQLContext(sc)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# %load './code/helpers/load_boundaries_and_pois.py'\n",
    "OVERPASS_API         = overpy.Overpass()\n",
    "BASE_DIR             = os.path.join(os.path.abspath('.'), 'work-flow')\n",
    "URBAN_BOUNDARIES_FILE = '06_Europe_Cities_Boundaries_with_Labels_Population.geo.json'\n",
    "\n",
    "# Paths to base datasets that we are using:\n",
    "URBAN_BOUNDARIES_PATH = os.path.join(BASE_DIR,URBAN_BOUNDARIES_FILE)\n",
    "POIS_PATH            = os.path.join(BASE_DIR, \"pois.json\")\n",
    "\n",
    "try:\n",
    "    geo_df\n",
    "except NameError:\n",
    "    geo_df = GeoDataFrame.from_file(URBAN_BOUNDARIES_PATH)\n",
    "    # Add a WKT column for use later\n",
    "    geo_df['wkt'] = pandas.Series(\n",
    "        map(lambda geom: str(geom.to_wkt()), geo_df['geometry']),\n",
    "        index=geo_df.index, dtype='string')\n",
    "\n",
    "try:\n",
    "    boundaries_from_pd\n",
    "except NameError:\n",
    "    boundaries_from_pd = sqlContext.createDataFrame(geo_df)\n",
    "    boundaries_from_pd.registerTempTable(\"boundaries\")\n",
    "\n",
    "try:\n",
    "    pois_df\n",
    "except NameError:\n",
    "    pois_df = sqlContext.read.json(POIS_PATH)\n",
    "    pois_df = pois_df.toPandas()\n",
    "    def toWktColumn(coords):\n",
    "        return (Point(coords).wkt)\n",
    "\n",
    "    pois_df['wkt'] = pandas.Series(\n",
    "        map(lambda geom: toWktColumn(geom.coordinates), pois_df['geometry']),\n",
    "        index=pois_df.index, dtype='string')\n",
    "\n",
    "    pois_df = sqlContext.createDataFrame(pois_df)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# -------------------------------------- Solution below -------------------------------------- #\n",
    "\n",
    "# Hint: We want to do a join based on within spatial predicate (from shapely)\n",
    "# Hint: Get a dataframe with city name and city geometry\n",
    "# Hint: when you have the join perform a group by\n",
    "\n",
    "# Some Useful references to documentation\n",
    "# # http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.join\n",
    "# # http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.SQLContext.registerFunction\n",
    "# # http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.functions.udf\n",
    "# # http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.Column.cast\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"100.0\" height=\"100.0\" viewBox=\"8.44096254054 47.3131580402 0.190907821406 0.128784319976\" preserveAspectRatio=\"xMinYMin meet\"><g transform=\"matrix(1,0,0,-1,0,94.7551004004)\"><g><path fill-rule=\"evenodd\" fill=\"#66cc99\" stroke=\"#555555\" stroke-width=\"0.00381815642811\" opacity=\"0.6\" d=\"M 8.51073070064,47.4321856996 L 8.5246681998,47.4311641999 L 8.52792424989,47.4326966806 L 8.54355820194,47.4313432 L 8.56083770048,47.4184912002 L 8.56327356287,47.4156632294 L 8.57378618013,47.4113305375 L 8.58770695385,47.407749793 L 8.59636970013,47.4056056999 L 8.594696201,47.3992081997 L 8.59148070168,47.3970187004 L 8.58817119975,47.3957447 L 8.58434520098,47.3918312003 L 8.58305470172,47.3890491999 L 8.58436120054,47.3883292 L 8.5851962018,47.3903277 L 8.5869666988,47.3901596998 L 8.59307370002,47.3856432004 L 8.59720520051,47.3784982 L 8.60363484106,47.3704819749 L 8.61199420077,47.3669092004 L 8.62166820051,47.3599741997 L 8.62479970189,47.3543931997 L 8.6220897007,47.3534011997 L 8.6051131999,47.3536037 L 8.58677770185,47.3528596999 L 8.5864354184,47.352688935 L 8.58451057778,47.3517285918 L 8.56990702022,47.3467946792 L 8.56574596328,47.3467780497 L 8.56250970122,47.3456917005 L 8.56149420162,47.3473436998 L 8.55820569994,47.3507997 L 8.5563412017,47.3521042001 L 8.55120469974,47.3526537002 L 8.55048569934,47.3533747 L 8.54713070159,47.3588337002 L 8.545538288,47.3639849772 L 8.53955331661,47.364938249 L 8.53629265028,47.3521106463 L 8.54340170119,47.3360942003 L 8.53623520009,47.3301472003 L 8.53265970044,47.3278657003 L 8.52974519821,47.3260612002 L 8.52375020113,47.3242416997 L 8.50311869945,47.3202287003 L 8.50062969947,47.3226967 L 8.50062369878,47.3311082 L 8.50116369968,47.3352741999 L 8.50202970008,47.3365517003 L 8.4976237013,47.3420911997 L 8.48689120125,47.3526882002 L 8.48202213556,47.3593225016 L 8.47101420108,47.3615112002 L 8.46426969911,47.3667376998 L 8.46415819949,47.3683702 L 8.46985165658,47.3739106058 L 8.4675942019,47.3735771996 L 8.45054820109,47.3783301999 L 8.44803320059,47.3801996998 L 8.45758720155,47.3836557003 L 8.46734620155,47.3930667002 L 8.46706556474,47.4018697238 L 8.47813170148,47.4038737001 L 8.47724070188,47.4119377002 L 8.4739331988,47.4122277002 L 8.4727222,47.4106867 L 8.46807869985,47.4133762002 L 8.46919819812,47.4169086996 L 8.47785369948,47.4221192001 L 8.48175570114,47.4230121998 L 8.48853519976,47.4226992001 L 8.49021440629,47.4225576846 L 8.48474119998,47.4281771996 L 8.48547469967,47.4310416998 L 8.50146770072,47.4348717001 L 8.51073070064,47.4321856996 z\" /></g></g></svg>"
      ],
      "text/plain": [
       "<shapely.geometry.multipolygon.MultiPolygon at 0x10c83d690>"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Exercise\n",
    "# for each city count the number of museums\n",
    "# and return a DF with:\n",
    "# city_name, museum_count\n",
    "\n",
    "# SQL Version\n",
    "# cities_df = sqlContext.sql(\n",
    "#     \"\"\"\n",
    "#     SELECT properties.NAMEASCII AS city_name, \n",
    "#         geometry AS city_geom FROM boundaries\n",
    "#     \"\"\")\n",
    "\n",
    "cities_df = boundaries_from_pd.select(\n",
    "    (boundaries_from_pd.NAMEASCII).alias('city_name'),\n",
    "    (boundaries_from_pd.wkt).alias('city_geom'))\n",
    "\n",
    "cities_df.cache()\n",
    "\n",
    "# Create a broadcast variable\n",
    "# Broadcast http://spark.apache.org/docs/latest/programming-guide.html#broadcast-variables\n",
    "# Broadcast variables allow the programmer to keep a read-only variable cached on each machine \n",
    "# rather than shipping a copy of it with tasks. They can be used, for example, to give every node \n",
    "# a copy of a large input dataset in an efficient manner. Spark also attempts to distribute broadcast\n",
    "# variables using efficient broadcast algorithms to reduce communication cost.\n",
    "\n",
    "# Spark actions are executed through a set of stages, separated by distributed “shuffle” \n",
    "# operations. Spark automatically broadcasts the common data needed by tasks within each stage. \n",
    "# The data broadcasted this way is cached in serialized form and deserialized before running each task. \n",
    "# This means that explicitly creating broadcast variables is only useful when tasks across multiple \n",
    "# stages need the same data or when caching the data in deserialized form is important.\n",
    "\n",
    "# Broadcast variables are created from a variable v by calling SparkContext.broadcast(v). The broadcast \n",
    "# variable is a wrapper around v, and its value can be accessed by calling the value method.\n",
    "\n",
    "_cities_df = cities_df.toJSON().collect()[0:3]\n",
    "broadcastCitiesJSON = sc.broadcast(_cities_df)\n",
    "\n",
    "wkt.loads(json.loads(broadcastCitiesJSON.value[0])['city_geom'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "8783\n",
      "41\n",
      "+--------------------+--------------------+---------+\n",
      "|         museum_geom|         museum_name|city_name|\n",
      "+--------------------+--------------------+---------+\n",
      "|[WrappedArray(8.5...|           Schiffbau|   Zurich|\n",
      "|[WrappedArray(8.4...|Neues Theater Spi...|   Zurich|\n",
      "|[WrappedArray(8.5...|Ortsmuseum Wollis...|   Zurich|\n",
      "|[WrappedArray(8.5...|      Kriminalmuseum|   Zurich|\n",
      "|[WrappedArray(8.5...|Museum für Gestal...|   Zurich|\n",
      "|[WrappedArray(8.5...|      Lebewohlfabrik|   Zurich|\n",
      "|[WrappedArray(8.5...|Piccolo Commedia ...|   Zurich|\n",
      "|[WrappedArray(8.5...|Theater am Hechtp...|   Zurich|\n",
      "|[WrappedArray(8.5...|Museum Porzellan ...|   Zurich|\n",
      "|[WrappedArray(8.5...|        Theater Stok|   Zurich|\n",
      "|[WrappedArray(8.5...|Kindertheater Purpur|   Zurich|\n",
      "|[WrappedArray(8.5...|   Zivilschutzmuseum|   Zurich|\n",
      "|[WrappedArray(8.5...|Alfred Escher-Statue|   Zurich|\n",
      "|[WrappedArray(8.5...|       Sogar Theater|   Zurich|\n",
      "|[WrappedArray(8.5...| Theater am Neumarkt|   Zurich|\n",
      "|[WrappedArray(8.5...| Theater Stadelhofen|   Zurich|\n",
      "|[WrappedArray(8.5...|          ComedyHaus|   Zurich|\n",
      "|[WrappedArray(8.5...|      Maag MusicHall|   Zurich|\n",
      "|[WrappedArray(8.5...|         Kulturmarkt|   Zurich|\n",
      "|[WrappedArray(8.5...|    Tanzhaus Zuerich|   Zurich|\n",
      "+--------------------+--------------------+---------+\n",
      "only showing top 20 rows\n",
      "\n"
     ]
    }
   ],
   "source": [
    "def get_city_name(poi_geom):\n",
    "    # get an array of dict [(city_name, city_geom)]\n",
    "    cities = map(lambda c: {\n",
    "                    'city_name': json.loads(c)['city_name'],\n",
    "                    'city_wkt': wkt.loads(json.loads(c)['city_geom'])\n",
    "                }, broadcastCitiesJSON.value)\n",
    "\n",
    "    shply_poi = shape(poi_geom.asDict())\n",
    "    city = filter(lambda city: shply_poi.within(city['city_wkt']), cities)\n",
    "    name = None\n",
    "    if city:\n",
    "        name = city[0]['city_name']\n",
    "    return name\n",
    "\n",
    "# http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.SQLContext.registerFunction\n",
    "# Registers a python function (including lambda function) as a UDF so it can be used in SQL statements.\n",
    "# In addition to a name and the function itself, the return type can be optionally specified. When the \n",
    "# return type is not given it default to a string and conversion will automatically be done. For any other \n",
    "# return type, the produced object must match the specified type.\n",
    "\n",
    "sqlContext.udf.register(\"get_city_name\", get_city_name, StringType())\n",
    "\n",
    "# http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.functions.udf\n",
    "# Creates a Column expression representing a user defined function (UDF).    \n",
    "get_city_name_udf = func.udf(get_city_name, StringType())\n",
    "\n",
    "# SQL VERSION\n",
    "# museums_df = sqlContext.sql(\n",
    "#     \"SELECT geometry as museum_geom, \\\n",
    "#     properties.name as museum_name, \\\n",
    "#     get_city_name(geometry) as city_name \\\n",
    "#     FROM pois WHERE properties.tourism = 'museum'\")\n",
    "\n",
    "\n",
    "museums_df = pois_df.select(\n",
    "    (pois_df['geometry']).alias('museum_geom'),\n",
    "    (pois_df['properties']['name']).alias('museum_name'),\n",
    "    (get_city_name_udf(pois_df['geometry']).alias('city_name'))\n",
    ")\n",
    "\n",
    "museums_df.registerTempTable(\"museums\")\n",
    "                       \n",
    "print museums_df.count()\n",
    "print cities_df.count()\n",
    "museums_df.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "493\n"
     ]
    }
   ],
   "source": [
    "# museums_df.cache() # Try without and with\n",
    "print museums_df.where(museums_df.city_name.isNotNull()).count()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Adding population column\n",
    "According to our algorithm we will have to divide count by the the population to scale it per capita. Lets try to run our algorithm on a subset of the data to get practice for the real deal!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "DataFrame[city_name: string, population: int, city_geom: string]"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# lets recreate the cities DF this time including the population\n",
    "# http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.Column.cast\n",
    "# Convert the column into type dataType\n",
    "cities_df = boundaries_from_pd.select(\n",
    "    (boundaries_from_pd.NAMEASCII).alias('city_name'),\n",
    "    (boundaries_from_pd.POPEU2013.cast(IntegerType())).alias('population'),\n",
    "    (boundaries_from_pd.wkt).alias('city_geom'))\n",
    "\n",
    "cities_df.cache()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "root\n",
      " |-- museum_geom: struct (nullable = true)\n",
      " |    |-- coordinates: array (nullable = true)\n",
      " |    |    |-- element: double (containsNull = true)\n",
      " |    |-- type: string (nullable = true)\n",
      " |-- museum_name: string (nullable = true)\n",
      " |-- city_name: string (nullable = true)\n",
      " |-- population: integer (nullable = true)\n",
      " |-- city_geom: string (nullable = true)\n",
      "\n",
      "root\n",
      " |-- city_name: string (nullable = true)\n",
      " |-- population: integer (nullable = true)\n",
      " |-- count: long (nullable = false)\n",
      "\n",
      "+---------+----------+-----+\n",
      "|city_name|population|count|\n",
      "+---------+----------+-----+\n",
      "|   Zurich|    380777|   71|\n",
      "|   Prague|   1243201|  305|\n",
      "|    Sofia|   1213542|   41|\n",
      "+---------+----------+-----+\n",
      "\n"
     ]
    }
   ],
   "source": [
    "museums_df = museums_df.dropna()\n",
    "\n",
    "# http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.join\n",
    "# Joins with another DataFrame, using the given join expression.\n",
    "\n",
    "# The following performs a full outer join between df1 and df2.\n",
    "\n",
    "# Parameters:\n",
    "# other – Right side of the join\n",
    "# on – a string for join column name, a list of column names, , a join expression (Column) or a list of Columns.\n",
    "#      If on is a string or a list of string indicating the name of the join column(s), the column(s) must exist \n",
    "#      on both sides, and this performs an equi-join.\n",
    "# how – str, default ‘inner’. One of inner, outer, left_outer, right_outer, leftsemi.\n",
    "\n",
    "df = museums_df.join(cities_df, museums_df.city_name == cities_df.city_name).select(\n",
    "    museums_df.museum_geom,\n",
    "    museums_df.museum_name,    \n",
    "    museums_df.city_name,\n",
    "    cities_df.population,\n",
    "    cities_df.city_geom\n",
    ")\n",
    "\n",
    "df.printSchema( )\n",
    "# Love Spark!\n",
    "\n",
    "grouped_by_city = df.groupBy('city_name', 'population')\n",
    "grouped_by_city = grouped_by_city.count()\n",
    "grouped_by_city.printSchema()\n",
    "grouped_by_city.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# ------------------------------------------ End of Exercise -------------------------------------------- #\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Calculating the score\n",
    "\n",
    "Now that we have the population and count of museums for each city - it should be possible to workout our score. Lets first rename the count column to something more useful"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.withColumnRenamed\n",
    "\n",
    "grouped_by_city = grouped_by_city.withColumnRenamed('count', 'museum_count')\n",
    "\n",
    "cultural_weight_lookup = {\n",
    "    'museum': 1, \n",
    "    'gallery': 2, \n",
    "    'artwork': 3 \n",
    "}\n",
    "\n",
    "# http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.withColumn\n",
    "# Returns a new DataFrame by adding a column or replacing the existing column that has the same name.\n",
    "\n",
    "# Parameters:\n",
    "# colName – string, name of the new column.\n",
    "# col – a Column expression for the new column.\n",
    "\n",
    "cultural_score_expression = (\n",
    "        grouped_by_city['museum_count']/grouped_by_city['population']\n",
    "    )*cultural_weight_lookup['museum']*10000\n",
    "\n",
    "grouped_by_city = grouped_by_city.withColumn('cultural_score', \n",
    "                                             cultural_score_expression)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "+---------+----------+------------+------------------+\n",
      "|city_name|population|museum_count|    cultural_score|\n",
      "+---------+----------+------------+------------------+\n",
      "|   Zurich|    380777|          71|1.8646084191009435|\n",
      "|   Prague|   1243201|         305| 2.453344229935465|\n",
      "|    Sofia|   1213542|          41|0.3378539844521244|\n",
      "+---------+----------+------------+------------------+\n",
      "\n"
     ]
    }
   ],
   "source": [
    "grouped_by_city.show()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.9"
  },
  "toc": {
   "toc_cell": false,
   "toc_number_sections": true,
   "toc_threshold": 6,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
